Clustering in Newspaper Pages

Authors

  • Marco Aiello
  • Andrea Pegoretti

Abstract

In the analysis of a newspaper page, an important step is the clustering of various text blocks into logical units, i.e., into articles. We propose three algorithms based on text processing techniques to cluster articles in newspaper pages. Based on the complexity of the three algorithms and experimentation on actual pages from the Italian newspaper L’Adige, we select one of the algorithms as the preferred choice to solve the textual clustering problem.
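The abstract does not say which text processing techniques the three algorithms use. Purely as an illustration of the textual clustering problem, and not as the authors' method, one common way to group text blocks by content is to turn each block into a TF-IDF vector, compare blocks with cosine similarity, and merge any pair whose similarity reaches a threshold. The function names, the tiny stop-word list, and the default `threshold` below are hypothetical choices made for this sketch.

```python
import math
import re
from collections import Counter

# Hypothetical, minimal stop-word list; a real system for L'Adige would need Italian stop words.
STOP_WORDS = {"the", "a", "an", "of", "in", "for", "and", "to"}


def tokenize(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


def tfidf_vectors(blocks):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text block."""
    docs = [Counter(tokenize(b)) for b in blocks]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency per term
    # Smoothed IDF: terms occurring in every block still keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_blocks(blocks, threshold=0.1):
    """Group block indices into articles: blocks with similarity >= threshold are
    merged (single-link, so merging is transitive), using a union-find structure."""
    vecs = tfidf_vectors(blocks)
    parent = list(range(len(blocks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(blocks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    blocks = [
        "The city council approved the new budget for public transport.",
        "Council members said the transport budget will fund new buses.",
        "The football team won the derby after extra time.",
        "Fans celebrated the derby victory of the team in the square.",
    ]
    print(cluster_blocks(blocks))  # [[0, 1], [2, 3]]
```

The pairwise comparison makes this sketch quadratic in the number of blocks, which is acceptable for the few dozen blocks on a single page. It uses only the words in each block and ignores layout cues such as column position or reading order.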


Similar resources

Textual Article Clustering in Newspaper Pages

In the analysis of a newspaper page an important step is the clustering of various text blocks into logical units, i.e., into articles. We propose three algorithms based on text processing techniques to cluster articles in newspaper pages. Based on the complexity of the three algorithms and experiment on actual pages from the Italian newspaper L’Adige, we select one of the algorithms as the pre...


An Architecture for Efficient News Items Clustering and Retrieval Based on Language Models for a Dynamic Collection of E-Newspapers

Newspaper pages consist of multiple individual articles divided into multiple columns. The challenging part of this task is to organize and integrate article blocks in the newspaper. This paper proposes a novel approach for article reconstruction from newspapers, including an aggregation of the multiple sections of an article and reading order recovery of each individual article. Thus, the process combi...


Reflection of Knowledge and Information Science’s News in the Press: A Case Study of Iran Newspaper

Background and Aim: The present study aims to explore the coverage and reflection of Knowledge and Information Science news in the Iranian press. Iran Newspaper, which is one of the main public newspapers in the country, has been selected as the case for this study. Method: This study used content analysis as its research methodology and adopted an inductive approach in data analysis. All the pag...


Finding Community Base on Web Graph Clustering

Search Pointers organize the main part of the application on the Internet. However, because of information management hardware, the high volume of data, and word similarities across different fields, most answers to users’ questions are not correct. Web graph clustering, and placing clusters in the corresponding answers, therefore helps users achieve their intended results. Community (web communit...


Automatic Layouting of Personalized Newspaper Pages

Layouting items in a 2D-constrained container for maximizing item value and minimizing wasted space is a 2D Cutting and Packing problem. We consider this task in the context of layouting news articles on fixed-size pages in a system for delivering personalized newspapers. We propose a grid-based page structure where articles can be laid out in different variants for increased flexibility. In ad...




Publication date: 2004